Some Useful Derivatives of Activation Functions


Most neural network learning is based on gradient descent with differentiable activation functions. For this reason, the derivatives of these activation functions are used repeatedly throughout this Neural Networks series on this website, and it is useful to gather them in a single place for future reference. This post provides details on the derivatives of these activation functions; later posts will refer to these results extensively.







  1. Linear and sign activations: The derivative of the linear activation function is 1 everywhere. The derivative of sign(ν) is 0 at all values of ν other than ν = 0, where it is discontinuous and non-differentiable. Because of the zero gradient and non-differentiability of this activation function, it is rarely used in the loss function even when it is used for prediction at testing time. The derivatives of the linear and sign activations are illustrated in Figure 1.10(a) and (b), respectively.


  2. Sigmoid activation: The derivative of the sigmoid activation is particularly simple when it is expressed in terms of the output of the sigmoid rather than the input. Let o be the output of the sigmoid function with argument ν:

    o = 1/(1 + exp(-ν))


    Then, one can write the derivative of the activation as follows:

    ∂o/∂ν = exp(-ν)/(1 + exp(-ν))²

    The key point is that this derivative can be written more conveniently in terms of the output:

    ∂o/∂ν = o(1 - o)


    The derivative of the sigmoid is often used as a function of the output rather than the input. The derivative of the sigmoid activation function is illustrated in Figure 1.10(c).



  3. Tanh activation: As in the case of the sigmoid activation, the tanh activation is often used as a function of the output o rather than the input ν:

    o = tanh(ν) = (exp(2ν) - 1)/(exp(2ν) + 1)


    One can then compute the gradient as follows:

    ∂o/∂ν = 4·exp(2ν)/(exp(2ν) + 1)²


    One can also write this derivative in terms of the output o:

    ∂o/∂ν = 1 - o²


    The derivative of the tanh activation is illustrated in Figure 1.10(d). A short numerical check of the output-based derivative forms for the sigmoid and tanh is sketched after this list.

  4. ReLU and hard tanh activations: The ReLU takes on a partial derivative value of 1 for non-negative values of its argument, and 0 otherwise. The hard tanh function takes on a partial derivative value of 1 for values of the argument in [-1, +1], and 0 otherwise. The derivatives of the ReLU and hard tanh activations are illustrated in Figure 1.10(e) and (f), respectively; these two piecewise derivatives are also sketched in code after this list.
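
The following is a minimal NumPy sketch, not part of the original post, that checks the output-based forms above: the sigmoid derivative o(1 - o) and the tanh derivative 1 - o² are compared against a centered finite-difference estimate. The helper names (sigmoid_grad_from_output, numerical_grad), the step size, and the test points are illustrative assumptions.

```python
import numpy as np

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

def sigmoid_grad_from_output(o):
    # Derivative of the sigmoid expressed via its output: o(1 - o).
    return o * (1.0 - o)

def tanh_grad_from_output(o):
    # Derivative of tanh expressed via its output: 1 - o^2.
    return 1.0 - o ** 2

def numerical_grad(f, v, eps=1e-6):
    # Centered finite difference, used only as a sanity check.
    return (f(v + eps) - f(v - eps)) / (2.0 * eps)

v = np.linspace(-4.0, 4.0, 9)
assert np.allclose(sigmoid_grad_from_output(sigmoid(v)), numerical_grad(sigmoid, v), atol=1e-6)
assert np.allclose(tanh_grad_from_output(np.tanh(v)), numerical_grad(np.tanh, v), atol=1e-6)
print("sigmoid'(ν) = o(1 - o) and tanh'(ν) = 1 - o² match finite differences")
```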
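
For the piecewise-linear activations of item 4, the sketch below (again an illustrative assumption, not code from the post) evaluates the ReLU and hard tanh functions and the piecewise-constant derivatives described above; the value 1 is assigned at the kink points, following the convention in the text.

```python
import numpy as np

def relu(v):
    return np.maximum(v, 0.0)

def relu_grad(v):
    # 1 for non-negative arguments, 0 otherwise (a subgradient choice at v = 0).
    return np.where(v >= 0.0, 1.0, 0.0)

def hard_tanh(v):
    return np.clip(v, -1.0, 1.0)

def hard_tanh_grad(v):
    # 1 inside [-1, +1], 0 outside.
    return np.where((v >= -1.0) & (v <= 1.0), 1.0, 0.0)

v = np.array([-2.0, -1.0, -0.5, 0.0, 0.5, 1.0, 2.0])
print("ν             :", v)
print("ReLU(ν)       :", relu(v))
print("ReLU'(ν)      :", relu_grad(v))
print("hard tanh(ν)  :", hard_tanh(v))
print("hard tanh'(ν) :", hard_tanh_grad(v))
```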